Abstract

A random search algorithm intended to solve discrete optimization
problems is considered. We outline the main components of the algorithm,
and then describe it in more detail. We show how the algorithm
can be implemented on parallel computer systems. A performance
analysis of both serial and parallel versions of the algorithm is given,
and related results of solving test problems are discussed.

Key-Words: global optimization, random search, parallel algorithms,
performance analysis of algorithms.

1 Introduction

Together with other global optimization techniques [H [21 O S], including 
Simulated Annealing, evolutionary algorithms, and the Tunneling method, 
random search presents a powerful approach to solving discrete optimization
problems when the objective function is too complex to obtain the solution
analytically, or does not have an appropriate analytical representation. One
can consider functions whose values are obtained as a response
from a controllable real-time process, or are evaluated through computer
simulation. The optimization problems become even more difficult to solve if
the evaluation of the function is a very time-consuming procedure, as is
normally the case when the function is determined via computer simulation.

To solve the difficult optimization problems above, we propose a random
search technique based on the Branch and Probability Bound (BPB)
approach introduced in [5] and further developed in [SI HI 13- The BPB
approach combines the usual branch and bound search scheme with
statistical procedures of parameter estimation based on random sampling.

The key feature of the BPB approach is that it allows one to examine sev- 
eral regions within a feasible set concurrently in a natural way. Therefore, 
when solving multiextremal optimization problems, BPB algorithms nor- 
mally offer an advantage over other global optimization techniques, which 
concentrate the search only on a single feasible region, and so could easily 
miss the solution. 

If the sampling procedure including evaluation of the objective function 
at the sample points takes much more time than the core part of a search 
algorithm, it is quite natural to arrange the procedure so that it could work 
in parallel. 

In this paper, we present a BPB random search algorithm together with
its parallel implementation. A performance analysis of the parallel
implementation is given based on the solution of some test problems. As our
computational experience shows, the parallel algorithm has good potential
to reduce the solution time when evaluation of the objective function is
time-consuming.

2 Problem and Solution Approach 

We consider the problem of finding

$$x^* = \arg\min_{x \in X} f(x),$$

where $X$ is a discrete feasible set, and $f$ is a real-valued function. As
examples of $X$, one can consider the set of integer vectors $x = (x_1, \ldots, x_n)$
with components $x_i \in \{1, \ldots, m\}$ for each $i = 1, \ldots, n$, or the set of
all permutations from the permutation group of order $n$.

It is assumed that the function value $f(x)$ is available for each point
$x$ from the feasible set $X$. However, we deal with problems in which the
function itself may not have an analytical representation, as is usually the
case in the analysis of the outcome of actual real-time processes or the output of
computer simulation runs. Note that in the latter case, the evaluation of the
function may be a time-consuming procedure.

To solve the problems, we propose a global random search algorithm 
based on the BPB approach. As in the standard branch and bound scheme, 
the BPB approach involves partitioning the feasible set into subsets fol- 
lowed by choosing the subsets most promising for the solution. However, it 
assumes both partitioning and determining the subsets for further search to 
be performed on the basis of some statistical procedures. 

As with many other adaptive random search techniques, the BPB
algorithms employ random sampling in which both the feasible set and
the sample probability distribution over the set are modified with each
new iteration so as to exploit the information about the function behavior
obtained in the course of the previous search.

The BPB approach offers a natural and efficient technique to control
the search, based on a statistical procedure which estimates the
prospectiveness of each subset for further consideration. The procedure evaluates
a criterion based on sample data, and the criterion serves two purposes. On
the one hand, it allows one to reduce the feasible set by removing the subsets
that have a low criterion value, and so could hardly contain the solution.

On the other hand, evaluation of the criterion plays the key role in
rebuilding the sample distribution. In fact, the new distribution is defined
in such a way that it provides for more intensive sampling in the more
promising subsets, that is, those with higher values of the prospectiveness
criterion.



3 Basic Components of the Algorithm 

In this section, we outline the basic concepts and components of the BPB 
random search algorithm. 



3.1 Prospectiveness Criterion 

Evaluation of prospectiveness of a subset for further search provides the 
basis of BPB algorithms. Consider a prospectiveness criterion introduced in 
[5] (see also HE]). 

Let $Z \subset X$ be a subset of the feasible set $X$, $\Xi = \{x_1, \ldots, x_K\}$ be a
sample from a probability distribution $P(dx)$ over $X$, and $y^*$ be the
minimum value of the function $f$ over $\Xi$:

$$y^* = \min_{x \in \Xi} f(x).$$

Assuming that $\Xi \cap Z \neq \emptyset$, one can evaluate $y = f(x)$ for each $x \in \Xi \cap Z$
to obtain a sample $\Upsilon = \{y_1, \ldots, y_N\}$, where $N = |\Xi \cap Z|$ is the cardinality
of $\Xi \cap Z$, and define $y_{(1)} \le \cdots \le y_{(N)}$ to be the order statistics associated
with $\Upsilon$.

The prospectiveness criterion for the subset $Z$ is defined as

$$\varphi_\Xi(Z) = \left(1 - \left(\frac{y_{(1)} - y^*}{y_{(k+1)} - y^*}\right)^{\alpha}\right)^{k}, \qquad (1)$$

where $k$ is a positive integer, and $\alpha$ is a positive real parameter.

As shown in [5], the criterion has a natural statistical interpretation.
If $k \to \infty$ and $k^2/N \to 0$ as $N \to \infty$, then $\varphi_\Xi(Z)$ converges to
the probability that

$$\min_{x \in Z} f(x) \le y^*.$$

In practice, the value of $k$ can be set according to the following
conditions. If $N \ge 10$, then one can take

$$k = \begin{cases} \lfloor N/10 \rfloor, & \text{if } N < 100, \\ 10, & \text{if } N \ge 100; \end{cases}$$

otherwise, one has to expand the sample $\Xi$ until $N = |\Xi \cap Z| \ge 10$, and
then evaluate the criterion once again.
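
The evaluation of the criterion for one subset can be sketched in a few lines of Python. This is only an illustration under stated assumptions: criterion (1) is taken in the form $(1 - ((y_{(1)} - y^*)/(y_{(k+1)} - y^*))^{\alpha})^{k}$, the rule for $k$ is the one given above, and the function names `choose_k` and `prospectiveness` are hypothetical rather than taken from the authors' code.

```python
def choose_k(n_points):
    """Rule for k from the text: floor(N/10) if N < 100, and 10 otherwise."""
    if n_points < 10:
        raise ValueError("at least 10 sample values are needed in the subset")
    return n_points // 10 if n_points < 100 else 10


def prospectiveness(values_in_z, y_star, alpha):
    """Assumed form of criterion (1) for the f-values of sample points in Z.

    values_in_z: f-values of the sample points that fall into the subset Z
    y_star:      smallest f-value over the whole sample
    alpha:       positive parameter, normally estimated from the sample
    """
    ordered = sorted(values_in_z)          # order statistics y_(1) <= ... <= y_(N)
    k = choose_k(len(ordered))
    numerator = ordered[0] - y_star        # y_(1) - y*
    if numerator <= 0:
        return 1.0                         # Z already contains the best point found
    denominator = ordered[k] - y_star      # y_(k+1) - y* (0-based index k)
    return (1.0 - (numerator / denominator) ** alpha) ** k
```

A subset that holds the best sample point gets the value 1, while a subset whose smallest value is far above $y^*$ relative to the spread of its own values gets a value close to 0.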

The parameter $\alpha$ is determined by the behavior of the function
$f$ on the entire feasible set $X$, and it is normally unknown. To estimate $\alpha$,
suppose that $y_{(1)} \le \cdots \le y_{(N)}$ are the order statistics corresponding to the
entire sample $\Xi$ over $X$. It can be shown [S] that the estimate

$$\hat{\alpha} = \ln 5 \Big/ \ln \frac{y_{(k+1)} - y_{(1)}}{y_{(m+1)} - y_{(1)}} \qquad (2)$$

converges to $\alpha$ if $k \to \infty$, $k^2/N \to 0$, and $m/k \to 0.2$ as $N \to \infty$. To
evaluate $\alpha$, one normally takes $k = 10$, $m = 2$, and $N \ge 100$.
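
As an illustration, the estimate can be computed directly from the sorted function values. The sketch below assumes that (2) has the form $\hat{\alpha} = \ln 5 / \ln\big((y_{(k+1)} - y_{(1)})/(y_{(m+1)} - y_{(1)})\big)$ and uses the recommended values $k = 10$, $m = 2$; the function name `estimate_alpha` is hypothetical.

```python
import math

K_ORDER = 10   # k recommended in the text
M_ORDER = 2    # m recommended in the text, so that m/k = 0.2


def estimate_alpha(sample_values):
    """Estimate alpha from the f-values of the entire sample (assumed form of (2))."""
    ordered = sorted(sample_values)
    if len(ordered) <= K_ORDER:
        raise ValueError("the sample is too small to evaluate the estimate")
    spread_k = ordered[K_ORDER] - ordered[0]   # y_(k+1) - y_(1)
    spread_m = ordered[M_ORDER] - ordered[0]   # y_(m+1) - y_(1)
    if spread_m <= 0 or spread_k <= spread_m:
        raise ValueError("the order statistics are too close to evaluate the estimate")
    return math.log(5.0) / math.log(spread_k / spread_m)
```

For values that grow like the square root of their rank, the ratio of the two spreads equals $\sqrt{5}$, and the estimate is 2.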

Note that the above asymptotic results have been initially obtained un- 
der the assumption that the feasible set X is a compact subset of an Eu- 
clidean space. However, as our computational experience shows (see, e.g. 
[6]), the related practical recommendations still work well when solving op- 
timization problems with discrete feasible sets. 

3.2 Representation of the Feasible Set 

At each iteration of the algorithm, the current feasible set $X$ is represented
as $X = Z_1 \cup \cdots \cup Z_k$, where $Z_j$, $j = 1, \ldots, k$, are subsets of a common
simple structure. As the basic subset type, hyperballs or hypercubes with
respect to a metric $\rho$ are normally taken to provide for efficient partitioning
and sampling procedures. Since for some discrete spaces (e.g., permutation
groups), the concept of a hypercube is not appropriate, we restrict ourselves
to hyperballs

$$B_r(z, \rho) = \{x \mid \rho(z, x) \le r\},$$

where $r$ is the radius, and $z$ is the center.

Starting with a hyperball $Z = B_r(z, \rho)$ of radius $r = R$, where $R$ is
large enough to cover the initial set $X$ at the first iteration, the algorithm
decrements the radius of the hyperballs with every new iteration
so as to allow for reduction of the feasible set, thereby concentrating the
search on more promising subsets.

3.3 Reduction and Partition of the Sets 

The reduction procedure is based on the partitioning of the current feasible
set $X$ into a subset $Z$ and its complement $X \setminus Z$.

In order to decide whether the complement can be removed, the procedure
first evaluates its related criterion (1) to get $\gamma = \varphi_\Theta(X \setminus Z)$, where $\Theta$ is the
set of all sample points currently available. If the value of $\gamma$ appears to be
less than a fixed low bound $\delta$, which determines the lowest level for subsets
to be considered as candidates for further search, then the complement is
removed.

The procedure combines reduction of the feasible set with
partition of the reduced set into subsets, and can be described in more detail
as follows. Suppose that $r$ is the common radius of hyperballs, and $\delta$ is the
low bound for the criterion $\varphi_\Theta$. Let us define

$$z_1 = \arg\min_{x \in \Theta} f(x), \qquad Z_1 = B_r(z_1, \rho),$$

and consider the value of $\gamma_1 = \varphi_\Theta(X \setminus Z_1)$.

If $\gamma_1 < \delta$, then the subset $X \setminus Z_1$ can be removed since it has a very
low prospectiveness level. Otherwise, when $\gamma_1 \ge \delta$, the procedure has to be
continued. Now we take

$$z_2 = \arg\min_{x \in \Theta \setminus Z_1} f(x), \qquad Z_2 = B_r(z_2, \rho).$$

After evaluation of $\gamma_2 = \varphi_\Theta(X \setminus (Z_1 \cup Z_2))$, the procedure may be
continued or ended depending on the value of $\gamma_2$. If continued, the procedure
is repeated in the same way until the remaining subset can be removed.

It may appear that there are not enough sample points available to
evaluate the criterion. In this case, one has to stop the procedure, go back
to extend the sample $\Theta$, and then start the procedure from the beginning.

Suppose that the procedure is repeated $k$ times before meeting the
condition of removing a subset. Upon completion of the procedure, we have the
current feasible set $X$ reduced to the union $Z_1 \cup \cdots \cup Z_k$, and the current
set of sample points reduced to $\Theta \cap (Z_1 \cup \cdots \cup Z_k)$.
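
The reduction loop described above can be sketched as follows. The sketch rests on several assumptions that are not fixed by the text: the sample is stored as a dictionary from points (integer tuples) to function values, the metric is the Hamming distance, the criterion has the form $(1 - ((y_{(1)} - y^*)/(y_{(k+1)} - y^*))^{\alpha})^{k}$, and returning `None` signals that the sample has to be extended. All function names are hypothetical.

```python
def hamming(x, y):
    """Hamming distance between two integer tuples of equal length."""
    return sum(a != b for a, b in zip(x, y))


def criterion(values, y_star, alpha):
    """Assumed form of criterion (1); needs at least 10 values."""
    ordered = sorted(values)
    n = len(ordered)
    k = n // 10 if n < 100 else 10
    top = ordered[0] - y_star
    if top <= 0:
        return 1.0
    return (1.0 - (top / (ordered[k] - y_star)) ** alpha) ** k


def reduce_feasible_set(sample, radius, delta, alpha):
    """Return the hyperball centers z_1, ..., z_k, or None if more points are needed.

    sample: non-empty dict mapping each sample point to its function value.
    """
    y_star = min(sample.values())
    remaining = dict(sample)      # sample points outside all hyperballs chosen so far
    centers = []
    while True:
        z = min(remaining, key=remaining.get)          # best point not yet covered
        centers.append(z)
        remaining = {x: y for x, y in remaining.items()
                     if hamming(z, x) > radius}
        if len(remaining) < 10:
            return None                                # extend the sample and restart
        if criterion(remaining.values(), y_star, alpha) < delta:
            return centers                             # the complement can be removed
```

The reduced set of sample points is then obtained by keeping only the points that lie within distance `radius` of at least one returned center.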

3.4 Sample Probability Distribution 

To make a decision on how to reduce the current feasible set, the algorithm
implements a statistical criterion based on random sampling over the set.
This makes the sampling procedure a key component of the algorithm. The
procedure applies a probability distribution, which is first set to the uniform
distribution over the initial feasible set $X$, and then modified with each new
iteration.

Suppose that the current set $X$ is formed by $k$ subsets (hyperballs):
$X = Z_1 \cup \cdots \cup Z_k$. The distribution $P(dx)$ over $X$ can be defined as a
superposition of a probability distribution over the set of hyperballs and the
uniform distribution over each hyperball. With a probability $p_j$ assigned to
the hyperball $Z_j$, we have

$$P(dx) = \sum_{j=1}^{k} p_j Q_j(dx),$$

where $Q_j(dx)$ denotes the uniform distribution over $Z_j$ for each $j = 1, \ldots, k$.

The algorithm sets the probabilities $p_1, \ldots, p_k$ in proportion to the values of
the criterion $\varphi_\Theta$ for their related hyperballs. In this way, the probabilities
control the search, allowing the algorithm to put more new sample
points into the hyperballs with higher probabilities.

In order to get the probabilities, one can evaluate $q_j = \varphi_\Theta(Z_j)$ for each
$j = 1, \ldots, k$, and then take

$$p_j = \frac{q_j}{\sum_{i=1}^{k} q_i}.$$

Note that there may not be enough sample points available when
evaluating $q_j$. In this case, it is quite natural to set $q_j = \delta$. If it appears that
all $q_j$ equal 0, one can set $q_j = 1$ for every $j = 1, \ldots, k$.
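
A minimal sketch of this step is given below. It assumes that the probabilities are obtained by normalizing the criterion values so that they sum to one, and it uses `None` to mark a hyperball for which there were too few sample points; the function name is hypothetical.

```python
def hyperball_probabilities(criteria, delta):
    """Turn criterion values q_1, ..., q_k into sampling probabilities p_1, ..., p_k.

    criteria: list of criterion values; None marks a hyperball with too few points
    delta:    low bound used in place of a criterion that cannot be evaluated
    """
    q = [delta if value is None else value for value in criteria]
    if all(value == 0 for value in q):
        q = [1.0] * len(q)             # fall back to equal weights
    total = sum(q)
    return [value / total for value in q]
```

For example, with criterion values 0.3 and 0.1, one hyperball without enough points, and a low bound of 0.1, the weights 0.3, 0.1, 0.1 are normalized by their sum 0.5.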

4 Random Search BPB Algorithm 

Now we summarize the ideas described above in the presentation of the 
entire search algorithm. 

The algorithm offers both global and local search capabilities.
With each new iteration of the global search, the algorithm decrements
the radius of the hyperballs by 1 until the radius reaches 1. All further
iterations are performed with the radius fixed at 1 until a local minimum
is found. We consider the best sample point found to be a local minimum if
all its nearest neighbors have already been examined, and are thus included in
the current set of sample points.
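
The local-minimum test can be illustrated for the case used later in the test problems, where points are integer vectors with components in $\{1, \ldots, m\}$ and neighbors are the points at Hamming distance 1. This setting and the function names below are assumptions made for the sketch.

```python
def unit_ball(x, m):
    """Yield x and all points that differ from x in exactly one coordinate."""
    x = tuple(x)
    yield x
    for i, xi in enumerate(x):
        for value in range(1, m + 1):
            if value != xi:
                yield x[:i] + (value,) + x[i + 1:]


def neighbors_examined(best, sampled_points, m):
    """True if the unit hyperball around the best point lies within the sample."""
    sampled = set(sampled_points)
    return all(point in sampled for point in unit_ball(best, m))
```

The unit hyperball contains $n(m-1) + 1$ points, so the test is cheap compared with the sampling itself.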

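Algorithm 1.

Step 1. Fix values for $K$, $R$ and $\delta$. Set $i = 1$, $r_0 = R$, $\Gamma_0 = \emptyset$, $X_1 = X$,
and $P_1(dx)$ to be the uniform distribution over $X_1$.

Step 2. Get a sample $\Xi_i = \{x_1^{(i)}, \ldots, x_K^{(i)}\}$ from $P_i(dx)$. For each $x \in \Xi_i$,
evaluate $f(x)$.

Step 3. Set $\Theta_i = \Gamma_{i-1} \cup \Xi_i$, and find

$$y_i^* = \min_{x \in \Theta_i} f(x), \qquad x_i^* = \arg\min_{x \in \Theta_i} f(x).$$

Step 4. If $i = 1$, then evaluate $\alpha$ with (2).

Step 5. Put $r_i = \max\{r_{i-1} - 1, 1\}$.

Step 6. If $r_i = 1$ and $B_1(x_i^*, \rho) \subset \Theta_i$, then STOP.

Step 7. Set $k = 1$, $U_0^{(i)} = \emptyset$.

Step 8. Find $z_k^{(i)} = \arg\min_{x \in \Theta_i \setminus U_{k-1}^{(i)}} f(x)$.

Step 9. Set $Z_k^{(i)} = B_{r_i}(z_k^{(i)}, \rho)$, $U_k^{(i)} = U_{k-1}^{(i)} \cup Z_k^{(i)}$.

Step 10. If $|\Theta_i \cap (X_i \setminus U_k^{(i)})| \ge 10$, then evaluate $\gamma_k^{(i)} = \varphi_{\Theta_i}(X_i \setminus U_k^{(i)})$.
Otherwise, replace $\Gamma_{i-1}$ with $\Theta_i$, and go to Step 2.

Step 11. If $\gamma_k^{(i)} \ge \delta$, then replace $k$ with $k + 1$, and go to Step 8.

Step 12. Set $X_{i+1} = U_k^{(i)}$.

Step 13. Set $\Gamma_i = \Theta_i \cap U_k^{(i)}$.

Step 14. For each $j = 1, \ldots, k$, evaluate

$$q_j^{(i)} = \begin{cases} \varphi_{\Gamma_i}(Z_j^{(i)}), & \text{if } |\Gamma_i \cap Z_j^{(i)}| \ge 10, \\ \delta, & \text{otherwise.} \end{cases}$$

Step 15. For each $j = 1, \ldots, k$, evaluate

$$p_j^{(i)} = \frac{q_j^{(i)}}{\sum_{m=1}^{k} q_m^{(i)}}.$$

Step 16. Set $P_{i+1}(dx) = \sum_{j=1}^{k} p_j^{(i)} Q_j^{(i)}(dx)$.

Step 17. Replace $i$ with $i + 1$, and go to Step 2.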

5 Parallel Version of the Algorithm 

In many practical situations, generating sample points and/or evaluating
the objective function at the points is a time-consuming procedure.
Specifically, sampling procedures can take a lot of time when the feasible
set is large and has a complex structure. As another illustration, one can
consider the evaluation of an objective function as a response from a lengthy
simulation run.

On the other hand, a sampling procedure intended to produce many
sample points in a unified way can normally be split into independent
routines, each turning out a part of the sample. Therefore, when the sampling
procedure takes significantly more time than the other steps of the algorithm,
one can achieve higher performance by rearranging the procedure to work
in parallel.

In fact, when designed to work in parallel, Algorithm 1 retains its overall
description with the only difference being that it now performs sampling
(Steps 2 and 3) as a parallel procedure.
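
The idea of splitting the sample among independent workers can be sketched as follows. The sketch uses a thread pool from the Python standard library as a stand-in for separate computers, samples uniformly over the whole set $\{1, \ldots, m\}^n$ rather than over hyperballs, and gives each worker its own seed; all of these choices and all names are assumptions made for illustration. Threads of this kind pay off mainly when the function evaluation waits on an external process or simulation.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sample_and_evaluate(task):
    """Worker routine: generate `count` points and evaluate f at each of them."""
    f, n, m, count, seed = task
    rng = random.Random(seed)                 # independent generator per worker
    points = [tuple(rng.randint(1, m) for _ in range(n)) for _ in range(count)]
    return [(x, f(x)) for x in points]


def parallel_sample(f, n, m, total, workers, seed=0):
    """Split a sample of `total` points among `workers` independent routines."""
    share, extra = divmod(total, workers)
    tasks = [(f, n, m, share + (1 if w < extra else 0), seed + w)
             for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(sample_and_evaluate, tasks))
    # The caller collects the parts into one sample, as in Steps 2 and 3.
    return [pair for part in parts for pair in part]
```

The rest of the algorithm is unchanged: it receives the merged list of points and values exactly as it would from a serial sampling routine.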

6 Test Problems 

As the feasible set for test problems, we consider the set of integer vectors
$x = (x_1, \ldots, x_n)$, where $x_i \in \{1, \ldots, m\}$ for each $i = 1, \ldots, n$. The number
$|X| = m^n$ of elements in $X$ is finite. Note, however, that in practice it can
be very large.

6.1 Metrics on the Feasible Set 

Selection of a suitable metric is very important in ensuring that random
search procedures are efficient. First, the metric should effectively separate
points which differ considerably, and group points which are similar according
to the nature of the problem under consideration. On the other hand, the
metric has to provide for efficient algorithms for generating random points
according to the uniform distribution over some standard (elementary) sets
like hyperballs or hypercubes.

Consider the following three metrics on $X$:

$$\rho_1(x, y) = \sum_{i=1}^{n} (1 - \delta_{x_i y_i}),$$

$$\rho_2(x, y) = \max_{1 \le i \le n} |x_i - y_i|,$$

$$\rho_3(x, y) = \sum_{i=1}^{n} |x_i - y_i|,$$

where $\delta_{ij} = 1$ if $i = j$, and $\delta_{ij} = 0$ otherwise.

It is easy to determine the maximum distance between two points for
each metric:

$$\max_{x, y \in X} \rho_1(x, y) = n,$$

$$\max_{x, y \in X} \rho_2(x, y) = m - 1,$$

$$\max_{x, y \in X} \rho_3(x, y) = n(m - 1).$$

Clearly, the metric $\rho_3$ can be considered as providing for more
granularity since it leads to a greater variety of values in the hyperball radius.
Assuming $n > m$, the metric $\rho_1$ can be ranked second with respect to the
same property.

Of the two metrics $\rho_1$ and $\rho_3$, the first one offers a simpler and
therefore more efficient sampling procedure when using hyperballs. Taking
that into account, we use the metric $\rho_1$, normally referred to as the Hamming
distance.
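
For concreteness, the three metrics can be written as short Python functions on integer tuples. The function names are chosen here for illustration only.

```python
def rho1(x, y):
    """Hamming distance: the number of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def rho2(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def rho3(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))
```

With $n = 4$ and $m = 5$, the points $(1, 1, 1, 1)$ and $(5, 5, 5, 5)$ attain the maximum distances $n = 4$, $m - 1 = 4$, and $n(m - 1) = 16$ for the three metrics, respectively.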

6.2 Uniform Probability Distributions 

Now we discuss how the uniform distribution over a hyperball determined
by the Hamming distance can be modeled. First note that any hyperball
$B_r(z, \rho)$ can be represented as

$$B_r(z, \rho) = S_0(z, \rho) \cup S_1(z, \rho) \cup \cdots \cup S_r(z, \rho),$$

where $S_i(z, \rho) = \{x \mid \rho(z, x) = i\}$ is a hypersphere for each $i = 0, 1, \ldots, r$.

The sampling over a hyperball can be arranged as a two-stage
procedure: (i) a hypersphere in the hyperball is selected according to some
distribution over the hyperspheres, and then (ii) the uniform distribution on
the hypersphere is modeled. Clearly, the probability assigned to a particular
hypersphere must be proportional to the total number of points on the
hypersphere.

Since any hypersphere of radius $i$ contains

$$N_i = \binom{n}{i} (m - 1)^i$$

points, the random selection of a hypersphere in a hyperball of radius $r$ can
be performed as follows.

Algorithm 2.

Step 1. Fix $n$, $m$, and $r \le n$.

Step 2. For each $i = 0, 1, \ldots, r$, evaluate

$$N_i = \binom{n}{i} (m - 1)^i.$$

Step 3. Set $N = N_0 + N_1 + \cdots + N_r$.

Step 4. For each $i = 0, 1, \ldots, r$, evaluate

$$P_i = \frac{1}{N} \sum_{j=0}^{i} N_j.$$

Step 5. Get a random number $u$ from the uniform distribution over $[0, 1]$.

Step 6. As the radius of the hypersphere, take $j = \min\{i \mid P_i \ge u\}$.

Upon selection of a hypersphere $S_r(z, \rho)$, one can generate a point
$x = (x_1, \ldots, x_n)$ on the hypersphere according to the uniform distribution.

Algorithm 3.

Step 1. Set $i = 1$, $x = z$, $M = \{1, \ldots, m\}$, and $N_0 = \{1, \ldots, n\}$.

Step 2. Get a random integer $j$ from the uniform distribution over $N_{i-1}$.

Step 3. Set $N_i = N_{i-1} \setminus \{j\}$.

Step 4. Get a random integer $k$ from the uniform distribution over $M \setminus \{z_j\}$.

Step 5. Set $x_j = k$.

Step 6. Replace $i$ with $i + 1$. If $i \le r$, then go to Step 2.

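
The point generation can be sketched in Python as follows. Drawing the $r$ coordinates in one call to `sample` replaces the repeated draws without replacement of Steps 2 and 3; this substitution, the function name, and the `rng` parameter are choices made for the sketch.

```python
import random


def sample_on_hypersphere(z, m, r, rng=random):
    """Return a uniformly distributed point at Hamming distance exactly r from z.

    z: center, a tuple with components in {1, ..., m}
    """
    x = list(z)
    # r distinct coordinates chosen uniformly, as in Steps 2 and 3
    positions = rng.sample(range(len(z)), r)
    for j in positions:
        k = rng.randrange(1, m)               # uniform over 1, ..., m - 1
        x[j] = k if k < z[j] else k + 1       # shift to skip the old value z_j
    return tuple(x)
```

Every changed coordinate receives a value different from the center, so the result always lies on the requested hypersphere.
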
6.3 Test Functions 

To test both serial and parallel versions of the algorithm, simple unimodal
and multimodal functions with a known global minimum are considered
(see, e.g., [3] for more examples). We assume the functions to be defined on
the set $X = \{(x_1, \ldots, x_n) \mid x_i \in \{1, \ldots, m\}, 1 \le i \le n\}$, provided that $m$ is
even, and $m < n$.

First, we consider an integer analog of De Jong's function:

$$f(x) = \sum_{i=1}^{n} (x_i - m/2)^2. \qquad (3)$$

As is easy to see, the function is unimodal with the minimum $f(x^*) = 0$
achieved at the point $x^* = (m/2, \ldots, m/2)$.

The following integer function is of the Rastrigin type:

$$f(x) = nm + \sum_{i=1}^{n} \left[ (x_i - m/2)^2 - m \cos(k\pi(x_i - m/2)/m) \right], \qquad (4)$$

where $k$ is an integer parameter.

If $k = 0$, the function coincides with De Jong's function, and it is
unimodal. As $k$ increases, the function becomes multimodal.

It has the global minimum $f(x^*) = 0$, where $x^* = (m/2, \ldots, m/2)$.

The function

$$f(x) = \sum_{i=1}^{n} |x_i - m/2| + \sum_{i=1}^{n-1} |x_i - x_{i+1}| + |x_n - x_1| \qquad (5)$$

has a local minimum $f(x_i^*) = n|i - m/2|$ at each point $x_i^* = (i, \ldots, i)$,
where $i = 1, \ldots, m$; and the global minimum $f(x^*) = 0$ that is achieved at
$x^* = (m/2, \ldots, m/2)$.
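
The three test functions can be written down directly. The sketch below assumes the forms $\sum_i (x_i - m/2)^2$ for (3), $nm + \sum_i [(x_i - m/2)^2 - m\cos(k\pi(x_i - m/2)/m)]$ for (4), and $\sum_i |x_i - m/2| + \sum_{i<n} |x_i - x_{i+1}| + |x_n - x_1|$ for (5); the function names are hypothetical.

```python
import math


def de_jong(x, m):
    """Function (3): sum of squared deviations from m/2."""
    return sum((xi - m / 2) ** 2 for xi in x)


def rastrigin_type(x, m, k):
    """Function (4): De Jong's function with a cosine term of frequency k."""
    n = len(x)
    return n * m + sum((xi - m / 2) ** 2
                       - m * math.cos(k * math.pi * (xi - m / 2) / m)
                       for xi in x)


def chain_abs(x, m):
    """Function (5): absolute deviations from m/2 plus cyclic neighbor differences."""
    n = len(x)
    deviations = sum(abs(xi - m / 2) for xi in x)
    links = sum(abs(x[i] - x[i + 1]) for i in range(n - 1))
    return deviations + links + abs(x[-1] - x[0])
```

All three functions vanish at $x^* = (m/2, \ldots, m/2)$, and with $k = 0$ the second function reduces to the first, which gives a quick consistency check of an implementation.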

Clearly, the optimization problems with these functions can immediately 
be solved analytically, and, in fact, do not call for any sophisticated compu- 
tational procedures. However, they could provide the basis for a preliminary 
performance analysis of the algorithm and for a prediction of its behavior 
when solving actual problems. 

7 Computational Experience 

Now we turn to the discussion of practical implementation of both serial 
and parallel algorithms, including test results and performance analysis. 

7.1 Software and Hardware Support 

To investigate the performance, both serial and parallel versions of the
algorithm were coded in C++ under the Linux RedHat 8.0 operating system.
The parallel code is based on the LAM 6.5.9 implementation [8] of the Message
Passing Interface (MPI) communication standard [9].

The parallel application consists of two modules: the first is intended to run
on the master computer, and the second is designed to support the slave
computers. The code running on the master controls the communication with the
slaves, and performs all the steps of the algorithm except for the sampling
procedure.

The master computer starts operating by establishing connections and 
broadcasting some general information, including the parameters n and m 
of the feasible set, among the slave computers. At each iteration of the 
algorithm, it sends requests to all slaves to produce samples. The request to 
a particular slave includes the current radius of hyperballs, and its own list 
of hyperball centers accompanied by the numbers of points to be generated 
in each hyperball. 

The sample points and their related values of the function are sent back 
to the master. Upon completion of the current iteration, the next iteration 
is initiated until the stop condition is met. 

The software was tested on a cluster of Intel Pentium II computers (500 MHz,
128 MB RAM, 10 GB HDD) connected by a 100BaseTX 100 Mbit LAN.

7.2 Serial Algorithm Analysis and Tests 

We begin with the results of testing a serial version of the algorithm code,
which does not include any MPI support. A series of test runs
was performed with the test functions defined on the feasible set $X$ with
$n = 200$, $m = 50$. The low bound $\delta$ was set to 0.1.

By performing the tests, we first try to understand how both $K$ and $R$
can affect the time the algorithm takes to find the solution.

Note that for large sample sizes $K$, the time to produce and utilize one
sample becomes quite significant, and generally leads to increased total time.
On the other hand, one can expect that a large $K$ provides for more accurate
statistical procedures that could reduce the overall number of samples, and
hence the total solution time.

Clearly, for a smaller initial radius $R$, the number of algorithm steps with
the current radius $r > 1$ should decrease. In fact, a small $R$ allows the
algorithm to concentrate more on the neighborhood of the best points found in
early steps. As this takes place, one can expect to reduce the total time when
solving problems with simple unimodal objective functions. However, for
more complicated multimodal functions, this time could become even larger
because of a possible rise in the number of examined points, especially at
$r = 1$.

The results of evaluating the total solution time for the test functions
with $K$ being varied from 50 to 300 and $R$ from 10 to 190 show that on
average the algorithm takes less time when both $K$ and $R$ are within the
range from 50 to 100.

Let $S$ be the time spent generating the samples and evaluating the
function (sampling time), and $A$ be the time the algorithm takes to utilize
the samples (algorithm time). The total solution time of the algorithm can
be represented as

$$T_s = S + A. \qquad (6)$$

Finally, let us denote the total number of sample points examined by
the algorithm during the solution process as $N$, and define $S_1 = S/N$ and
$A_1 = A/N$ to represent the average sampling and algorithm times for one sample
point.

In Table 1 we present a brief summary of the test results for the serial
algorithm for each test function. The summary includes the average
times and numbers of examined points, calculated over the entire series of
test runs.

Test                  Run time (sec.)      Point time (msec.)
function       N      T_s     S      A      S_1     A_1
(3)         132994    212    147     66     1.10    0.50
(4)         129270    675    554    121     4.29    0.94
(5)         199244    914    343    570     1.72    2.86

Table 1: Summary results for the test runs.

7.3 Parallel Algorithm Analysis



The total time the parallel algorithm takes to get the solution can be written
as

$$T_p = S/p + A + C, \qquad (7)$$

where $p$ is the number of slaves, and $C$ is the time the master spends on
transmission of control/sample data to/from the slaves (communication time).

Clearly, with (6) and (7), the speedup the parallel algorithm can achieve
using one master and $p > 1$ slave computers can be represented as

$$\sigma(p) = \frac{T_s}{T_p} = \frac{S + A}{S/p + A + C}.$$

Let us denote the average data transmission time for one sample point
as $C_1$. Assuming the amount of control data the master sends to be well
below that of the sample data it receives, one can expect $C_1 \approx C/N$. Now
we can write

$$\sigma(p) \approx \frac{S_1 + A_1}{S_1/p + A_1 + C_1}. \qquad (8)$$

With (8) one can examine the conditions required for the parallel
algorithm to achieve a true speedup, and estimate the actual speedup in particular
problems. Specifically, in order to get a speedup $\sigma > 1$, one should have

$$\frac{S_1}{C_1} > \frac{p}{p - 1}.$$

If the algorithm time $A_1$ appears to be much less than both the sampling
time $S_1$ and the communication time $C_1$, we have the speedup

$$\sigma(p) \approx \frac{S_1}{S_1/p + C_1} = \frac{\tau p}{\tau + p}, \qquad \tau = \frac{S_1}{C_1}.$$

Since at a fixed $\tau$ it holds that $\sigma(p) \to \tau$ as $p \to \infty$, one can see that $\tau$
is the maximum asymptotic speedup of the parallel algorithm.

Note, however, that the actual speedup can be much lower than $\tau$. It
depends on the value of $A_1$, and approaches 1 when $A_1$ becomes sufficiently
large. In addition, the simplified model (7) does not take into account
many of the details of the actual network communication process, which
could affect the speedup adversely, especially when the level of parallelism
increases.



7.4 Parallel Algorithm Tests 

To evaluate the expected speedup, we need an estimate of the average
communication time $C_1$.

With $n = 200$, the data for one point comprise
408 bytes, including $2n = 400$ bytes for the integer components of the related
vector, and 8 bytes for the function value. Our computational experience
shows that the average time to transmit the point data is approximately
equal to 1 millisecond.

With $C_1 \approx 1$, and the parameters $S_1$ and $A_1$ taken from Table 1, one can
apply (8) to evaluate the speedup for any $p > 1$ (see Fig. 1).
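
The prediction can be reproduced with a few lines of Python. The sketch assumes that model (8) has the form $(S_1 + A_1)/(S_1/p + A_1 + C_1)$ and that the per-point times 1.10/0.50, 4.29/0.94 and 1.72/2.86 msec. in the summary table belong to functions (3), (4) and (5), respectively; the names used are hypothetical.

```python
def predicted_speedup(p, s1, a1, c1):
    """Assumed form of model (8); all times are in msec. per sample point."""
    return (s1 + a1) / (s1 / p + a1 + c1)


# Per-point sampling and algorithm times (S_1, A_1), keyed by test function number.
POINT_TIMES = {3: (1.10, 0.50), 4: (4.29, 0.94), 5: (1.72, 2.86)}
C1 = 1.0  # average communication time per point, msec.

if __name__ == "__main__":
    for func, (s1, a1) in POINT_TIMES.items():
        row = ", ".join(f"p={p}: {predicted_speedup(p, s1, a1, C1):.2f}"
                        for p in (1, 2, 3, 5, 10, 15))
        print(f"function ({func}): {row}")
```

Running the script prints one row of predicted speedups per test function for several numbers of slaves, which can be compared against the measured solution times.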

Figure 1: Predicted speedup for the test functions (speedup plotted against the number of slaves).

As is easy to see, one can expect an actual speedup only for the function
(4), which has the best value of $\tau = S_1/C_1 \approx 4.29$. For the other functions,
any significant speedup can hardly be achieved because of the low level of
$\tau = 1.10$ for (3), and the high magnitude of $A_1 = 2.86$ for (5).

In order to evaluate the actual speedup for function (4), several series of test
runs were performed for each $K = 50, 100, 150$, and $p = 1, 2, 3, 4, 5$. One
series involves a particular run for every value of $R$ varied from 50 to 150
in steps of 10. The average total solution time over the values of $R$ for each series
is represented in Fig. 2, where $p = 0$ corresponds to the serial version of the
algorithm.

One can see that the best speedup achieved was about 1.6-1.7 when using
one master and 3 slave computers. Although the speedup appears to be
relatively small, it does demonstrate the potential of parallelization. Since
evaluation of the test functions involves only a few operations, the sampling
procedure does not take much time to produce samples. As the performance
analysis shows, if this procedure is time-consuming, one can expect
to achieve even greater efficiency.